Improving the Computational Morphological Analysis of a Swahili Corpus for Lexicographic Purposes

نویسندگان

  • Guy De Pauw
  • Gilles-Maurice de Schryver
چکیده

Computational morphological analysis is an important first step in the automatic treatment of natural language and a useful lexicographic tool. This article describes a corpus-based approach to the morphological analysis of Swahili. We particularly focus our discussion on its ability to retrieve lemmas for word forms and evaluate it as a tool for corpus-based dictionary

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A heuristic for morpheme discovery based on string edit distance

This paper derives from work we have been doing on unsupervised learning of the morphology of languages with rich morphologies, that is, with a high average number of morphemes per word. Our focus in this paper is Swahili, a major Bantu language of East Africa, and our goal is the development of a system that can automatically produce a morphological analyzer of a text on the basis of a large c...

متن کامل

Using Syllables as Features in Morpheme Tagging in Swahili

Utilizing corpora to build morphological analyzers for the purposes of computational application has been addressed in many different ways. Methods for automated morphological analysis generally focus on segmentation from raw text, and ignore the actual learning of what morpheme features are present. Other methods are time-consuming and require a great deal of prior knowledge of the language su...

متن کامل

Hedges in English for Academic Purposes: A Corpus-based study of Iranian EFL learners

Hedges, as tools to express tentativeness and doubt, have been studied in plenty of research papers in the Iranian EFL research setting. However, their use in a learner corpus, portraying Iranian learner English, is in need of more research attention. With this end in view, this study aimed at investigating how Iranian EFL learners who have majored in English-related fields in Iran deployed hed...

متن کامل

Language Resources Development at the Spanish Royal Academy

$EVWUDFW This paper explains some of the most relevant issues concerning the development of language resources at the Spanish Royal Academy. Two 125-M words corpus of Spanish language (synchronic and diachronic) and three specialized corpus has been developed. Around the corpus, RAE is also developing NLP tools and resources to morpho-syntactically annotate them. Some of the most relevant are: ...

متن کامل

Disambiguation of morphological analysis in Bantu languages

The paper describes problems in disambiguating the morphological analysis of Bantu languages by using Swahili as a test language. The main factors of ambiguity in this language group can be traced to the noun class structure on one hand and to the bi-directional word-formation on the other. In analyzing word-forms, the system applied utilizes SWATWOL, a morphological parsing program based on tw...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2008